-
Notifications
You must be signed in to change notification settings - Fork 713
[#9429] improvement(core): Use join to improve query performance on table, topic and so on. #9430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…e, catalog, schema, and table tables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces performance optimizations for table metadata queries through database indexing and a new query method, plus adds Caffeine-based caching for Metalake entities. However, there are several critical issues that must be addressed before merging.
Key changes:
- Added composite indexes on
(name, deleted_at)columns across metalake, catalog, schema, and table metadata tables - Introduced
selectTableByFullQualifiedNamemethod to query tables using fully qualified names with JOIN operations - Added Caffeine cache for
BaseMetalakeobjects inMetalakeManagerwith automatic expiry and cleanup
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/h2/schema-1.1.0-h2.sql | Adds composite indexes on name and deleted_at columns; contains critical bug on line 66 |
| scripts/h2/upgrade-1.0.0-to-1.1.0-h2.sql | Migration script to add indexes for existing H2 databases |
| scripts/mysql/schema-1.1.0-mysql.sql | Adds composite indexes on name and deleted_at columns; contains critical SQL syntax error on line 83 |
| scripts/mysql/upgrade-1.0.0-to-1.1.0-mysql.sql | Migration script to add indexes for existing MySQL databases |
| scripts/postgresql/schema-1.1.0-postgresql.sql | Adds composite indexes on name and deleted_at columns |
| scripts/postgresql/upgrade-1.0.0-to-1.1.0-postgresql.sql | Migration script to add indexes for existing PostgreSQL databases |
| core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java | Implements Caffeine cache for metalakes; has error handling and resource cleanup issues |
| core/src/main/java/org/apache/gravitino/storage/relational/mapper/TableMetaMapper.java | Adds new mapper method for querying by fully qualified name |
| core/src/main/java/org/apache/gravitino/storage/relational/mapper/TableMetaSQLProviderFactory.java | Adds factory method for the new query |
| core/src/main/java/org/apache/gravitino/storage/relational/mapper/provider/base/TableMetaBaseSQLProvider.java | Implements JOIN-based query; contains critical SQL injection vulnerability |
| core/src/main/java/org/apache/gravitino/storage/relational/service/TableMetaService.java | Updates table lookup to use new optimized query method |
Comments suppressed due to low confidence (1)
core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java:90
- The
close()method does not clean up themetalakeCacheor shut down the scheduled executor service used by the cache's scheduler. This can lead to resource leaks when the MetalakeManager is closed. AddmetalakeCache.invalidateAll()and shutdown the scheduler's executor service in the close method.
@Override
public void close() {
// do nothing
}
core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/gravitino/storage/relational/service/TableMetaService.java
Show resolved
Hide resolved
...a/org/apache/gravitino/storage/relational/mapper/provider/base/TableMetaBaseSQLProvider.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/gravitino/metalake/MetalakeManager.java
Outdated
Show resolved
Hide resolved
...a/org/apache/gravitino/storage/relational/mapper/provider/base/TableMetaBaseSQLProvider.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 33 out of 34 changed files in this pull request and generated 9 comments.
core/src/test/java/org/apache/gravitino/storage/TestEntityStorage.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/gravitino/storage/relational/service/SchemaMetaService.java
Show resolved
Hide resolved
core/src/main/java/org/apache/gravitino/storage/relational/service/FilesetMetaService.java
Outdated
Show resolved
Hide resolved
mchades
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no more comments.
would you like to review again?



What changes were proposed in this pull request?
This pull request introduces major performance and functionality improvements to the Metalake and table metadata management system. The most significant change is the addition of a Caffeine-based cache for Metalake entities, which reduces redundant storage access and improves efficiency. Additionally, it adds a new method for querying tables by their fully qualified names and optimizes database schema indexing for faster lookups.
Metalake caching and management improvements
metalakeCache) inMetalakeManagerto storeBaseMetalakeobjects, with automatic expiry and scheduled cleanup. This cache is now used inmetalakeInUse,loadMetalake, and is invalidated on alter, drop, enable, and disable operations to ensure consistency. [1] [2] [3] [4] [5] [6] [7] [8] [9]Table metadata querying enhancements
selectTableByFullQualifiedNametoTableMetaMapper, its SQL provider, and the service layer, allowing retrieval of table metadata using metalake, catalog, schema, and table names directly, improving query flexibility and performance. [1] [2] [3] [4] [5]Database schema and indexing optimizations
idx_name_da) on themetalake_name,catalog_name,schema_name, andtable_namecolumns (withdeleted_at) in both H2 and MySQL schema scripts, significantly improving query performance for lookups by name. [1] [2] [3] [4] [5] [6] [7] [8] [9]Why are the changes needed?
To improve performance
Fix: #9429
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
Existing test